Quality Assessment of the Affymetrix U133A&B Probesets by Target Sequence Mapping and Expression Data Analysis

Authors

  • Yuriy L. Orlov
  • Jiangtao Zhou
  • Leonard Lipovich
  • Atif Shahab
  • Vladimir A. Kuznetsov
Abstract

Careful analysis of microarray probe design should be an obligatory component of the MicroArray Quality Control (MAQC) project [Patterson et al., 2006; Shi et al., 2006], initiated by the FDA (USA) to provide quality control tools to researchers of gene expression profiles and to translate microarray technology from bench to bedside. The identification and filtering of unreliable probesets are important preprocessing steps before the analysis of microarray data. These steps may substantially improve the selection of differentially expressed genes, gene clustering and the construction of co-regulatory expression networks. We revised the genome localization of the Affymetrix U133A&B GeneChip initial (target) probe sequences and evaluated the impact of erroneous and poorly annotated target sequences on the quality of gene expression data. We found that about 25% of Affymetrix target sequences overlap with interspersed repeats, which could cause cross-hybridization effects. In total, discrepancies in target sequence annotation affect up to approximately 30% of the 44692 Affymetrix probesets. We introduce a novel quality control algorithm based on mapping target sequences onto the genome and on analysis of GeneChip expression data. To validate the quality of probesets, we used expression data from large, clinically and genetically distinct groups of breast cancers (249 samples). For the first time, we quantitatively evaluated the effect of repeats and other sources of inadequate probe design on the specificity, reliability and discrimination ability of Affymetrix probesets. We propose that only the functionally reliable Affymetrix probesets that passed our quality control algorithm (approximately 86%) be used for gene expression analysis. The target sequence annotation and filtering are available upon request.
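
The abstract does not spell out the quality control algorithm, so the following is only a minimal sketch of one step it mentions: flagging probesets whose mapped target sequence overlaps an interspersed repeat. The function names, the tuple layouts, the `min_overlap` threshold and the toy probeset identifiers are all assumptions made for illustration; they are not taken from the paper or from any Affymetrix annotation file.

```python
from collections import defaultdict


def flag_repeat_overlaps(targets, repeats, min_overlap=1):
    """Return the set of probeset IDs whose target interval overlaps a repeat.

    targets: sequence of (probeset_id, chrom, start, end), half-open coordinates
    repeats: sequence of (chrom, start, end), half-open coordinates
    min_overlap: hypothetical threshold, in bases, for calling an overlap
    """
    # Group repeat intervals by chromosome so each target is compared
    # only against repeats on its own chromosome.
    repeats_by_chrom = defaultdict(list)
    for chrom, start, end in repeats:
        repeats_by_chrom[chrom].append((start, end))

    flagged = set()
    for probeset_id, chrom, start, end in targets:
        for r_start, r_end in repeats_by_chrom.get(chrom, []):
            # Length of the shared region; zero or negative means no overlap.
            overlap = min(end, r_end) - max(start, r_start)
            if overlap >= min_overlap:
                flagged.add(probeset_id)
                break
    return flagged


def filter_reliable(targets, repeats, min_overlap=1):
    """Return probeset IDs, in input order, that were not flagged."""
    flagged = flag_repeat_overlaps(targets, repeats, min_overlap)
    return [t[0] for t in targets if t[0] not in flagged]


# Toy data with made-up identifiers and coordinates.
targets = [
    ("ps_A", "chr1", 100, 200),  # overlaps the chr1 repeat by 50 bases
    ("ps_B", "chr1", 500, 600),  # no repeat nearby
    ("ps_C", "chr2", 100, 200),  # only touches the chr2 repeat boundary
]
repeats = [("chr1", 150, 300), ("chr2", 200, 250)]

flagged = flag_repeat_overlaps(targets, repeats)
reliable = filter_reliable(targets, repeats)
```

A linear scan per target is enough for a toy example; for tens of thousands of probesets against a genome-wide repeat annotation, a sorted-interval or interval-tree lookup would be the more usual choice. The expression-data part of the quality control described in the abstract is not covered by this sketch.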

Similar resources

Unexpected presence of mycoplasma probes on human microarrays.

The contamination of cell cultures by mycoplasmas poses a problem in that it can adversely affect cellular behavior and physiology; this is exacerbated by the fact that colonization is not easy to detect (1). In a previous study investigating the effects on microarray data, it was shown that mycoplasma contamination can alter patterns of human gene expression by upsetting host cell phy...

A sequence-based identification of the genes detected by probesets on the Affymetrix U133 plus 2.0 array

One of the biggest problems facing microarray experiments is the difficulty of translating results into other microarray formats or comparing microarray results to other biochemical methods. We believe that this is largely the result of poor gene identification. We re-identified the probesets on the Affymetrix U133 plus 2.0 GeneChip array. This identification was based on the sequence of the pr...

Identification of Shortened 3′ untranslated Regions from Expression Arrays

Cancer cells have recently been shown to express high levels of short 3'UTR isoforms that can escape miRNA-mediated regulation. We present here a computational procedure for systematically identifying shortened 3'UTRs by Affymetrix 3' microarrays. The advantage of this technology compared to more recent and promising ones such as exon arrays and RNA-Seq is that, given the relatively small cost,...

Widespread existence of uncorrelated probe intensities from within the same probeset on Affymetrix GeneChips

We have developed a computational pipeline to analyse large surveys of Affymetrix GeneChips, for example NCBI's Gene Expression Omnibus. GEO samples data for many organisms, tissues and phenotypes. Because of this experimental diversity, any observed correlations between probe intensities can be associated either with biology that is robust, such as common co-expression, or with systematic bias...

Text S6

We chose to not include samples from various platforms and technologies (e.g. other Affymetrix models, Agilent microarrays, RNA-seq) because of their inherent differences in sample preparation steps, hybridization chemistry, probeset/primer length and sequences, data preprocessing techniques, and so forth – all of which lead to poor correlation of the same features, making them not as readily c...

Journal:
  • In silico biology

Volume 7, Issue 3

Publication date: 2007